
12 research outputs found

Orienting Ordered Scaffolds: Complexity and Algorithms

Author: Aganezov Sergey
Alekseyev Max A.
Alexeev Nikita
Avdeyev Pavel
Rong Yongwu
Publication date: 25/11/2019

Despite the recent progress in genome sequencing and assembly, many of the currently available assembled genomes come in a draft form. Such draft genomes consist of a large number of genomic fragments (scaffolds), whose order and/or orientation (i.e., strand) in the genome are unknown. There exist various scaffold assembly methods, which attempt to determine the order and orientation of scaffolds along the genome chromosomes. Some of these methods (e.g., based on FISH physical mapping, chromatin conformation capture, etc.) can infer the order of scaffolds, but not necessarily their orientation. This leads to a special case of the scaffold orientation problem (i.e., deducing the orientation of each scaffold) with a known order of the scaffolds. We address the problem of orienting ordered scaffolds (OOS) as an optimization problem based on given weighted orientations of scaffolds and their pairs (e.g., coming from paired-end sequencing reads, long reads, or homologous relations). We formalize this problem using the notion of a scaffold graph (i.e., a graph where vertices correspond to the assembled contigs or scaffolds and edges represent connections between them). We prove that this problem is NP-hard, and present a polynomial-time algorithm for solving its special case, where the orientation of each scaffold is imposed relative to at most two other scaffolds. We further develop an FPT algorithm for the general case of the OOS problem.
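To make the optimization view in the abstract above concrete, here is a minimal Python sketch that scores every orientation assignment and keeps the best one. It is an exhaustive search for illustration only, not the polynomial-time or FPT algorithm from the paper. The function name `orient_scaffolds`, the weight dictionaries, and the additive scoring rule are all assumptions introduced for this example.

```python
from itertools import product


def orient_scaffolds(n, single_weights, pair_weights):
    """Brute-force search over all 2**n orientation assignments.

    Hypothetical input model (an assumption, not taken from the paper):
      single_weights: {(i, o): w} rewards scaffold i having orientation o
      pair_weights:   {(i, oi, j, oj): w} rewards scaffolds i and j having
                      orientations oi and oj at the same time
    Orientations are the characters "+" and "-".
    """
    best_score, best_assignment = float("-inf"), None
    for assignment in product("+-", repeat=n):
        # Add up the weights of all single-scaffold hints that are satisfied.
        score = sum(w for (i, o), w in single_weights.items()
                    if assignment[i] == o)
        # Add up the weights of all pairwise hints that are satisfied.
        score += sum(w for (i, oi, j, oj), w in pair_weights.items()
                     if assignment[i] == oi and assignment[j] == oj)
        if score > best_score:
            best_score, best_assignment = score, assignment
    return best_assignment, best_score


# Toy instance with three ordered scaffolds.
singles = {(0, "+"): 2.0}
pairs = {(0, "+", 1, "-"): 3.0, (1, "+", 2, "+"): 1.0, (1, "-", 2, "-"): 2.5}
print(orient_scaffolds(3, singles, pairs))
```

The running time grows as 2^n, which is consistent with the need for the special-case and parameterized algorithms the abstract mentions.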

arXiv.org e-Print Archive

The third international hackathon for applying insights into large-scale genomic composition to use cases in a wide range of organisms

Author: Agustinho Daniel Paiva
Aliyev Elbay
Avdeyev Pavel
Barrozo Enrico R.
Behera Sairam
Billingsley Kimberley
Busby Ben
Chen Guangyi
Chong Li Chuin
Choubey Deepak
Dabbaghie Fawaz
De Coster Wouter
Fu Yilei
Gener Alejandro R.
Hefferon Timothy
Henke David Morgan
Höps Wolfram
Illarionova Anastasia
Jochum Michael D.
Jose Maria
Kalra Divya
Kesharwani Rupesh K.
Khleifat Ahmad Al
Kolora Sree Rohit Raj
Kubica Jedrzej
Lakra Priya
Lattimer Damaris
Liew Chia-Sin
Lo Bai-Wei
Lo Chunhsuan
Lowdon Rebecca
Lötter Anneri
Mahmoud Medhat
Majidian Sina
Mendem Suresh Kumar
Molik David
Mondal Rajarshi
Ohmiya Hiroko
Parvin Nasrin
Paulin Luis F.
Peralta Carolina
Pfeifer Susanne P.
Poon Chi-Lam
Prabhakaran Ramanandan
Raza Muhammad Sohail
Saitou Marie
Sammi Aditi
Sanio Philippe
Sapoval Nicolae
Sedlazeck Fritz J
Soto Daniela C.
Syed Najeeb
Treangen Todd
Walker Kimberly
Wang Gaojianyong
Xu Tiancheng
Yang Jianzhi
Zhang Shangzhe
Zhou Weiyu
Publication venue: 'F1000 Research Ltd'
Publication date: 01/01/2022


Brage NMBU

PubMed Central

UPSpace at the University of Pretoria

Reconstruction of Ancestral Genomes in Presence of Gene Gain and Loss

Author: Avdeyev Pavel
Publication venue: Health Sciences Research Commons
Publication date: 01/03/2016

Since most dramatic genomic changes are caused by genome rearrangements as well as gene duplications and gain/loss events, it becomes crucial to understand their mechanisms and reconstruct ancestral genomes of the given genomes. This problem was shown to be NP-complete even in the “simplest” case of three genomes, thus calling for heuristic rather than exact algorithmic solutions. At the same time, a larger number of input genomes may actually simplify the problem in practice, as was earlier illustrated with MGRA, a state-of-the-art software tool for reconstruction of ancestral genomes of multiple genomes. One of the key obstacles for MGRA and other similar tools is the presence of breakpoint reuse, when the same breakpoint region is broken by several different genome rearrangements in the course of evolution. Furthermore, such tools are often limited to genomes composed of the same genes, with each gene present in a single copy in every genome. This limitation makes these tools inapplicable to many biological datasets and degrades the resolution of ancestral reconstructions in diverse datasets. We address these deficiencies by extending the MGRA algorithm to genomes with unequal gene contents. The developed next-generation tool MGRA2 can handle gene gain/loss events and shares the ability of MGRA to reconstruct ancestral genomes uniquely in the case of limited breakpoint reuse. Furthermore, MGRA2 employs a number of novel heuristics to cope with higher breakpoint reuse and process datasets inaccessible to MGRA. In practical experiments, MGRA2 shows superior performance for simulated and real genomes as compared to other ancestral genome reconstruction tools.

George Washington University: Health Sciences Research Commons (HSRC)

Implicit Transpositions in DCJ Scenarios

Author: Max A. Alekseyev
Pavel Avdeyev
Shuai Jiang
Publication venue: 'Frontiers Media SA'
Publication date: 01/01/2017

Genome rearrangements are large-scale evolutionary events that shuffle genomic architectures. The minimal number of such events between two genomes is often used in phylogenomic studies to measure the evolutionary distance between the genomes. Double-Cut-and-Join (DCJ) operations represent a convenient model of the most common genome rearrangements (reversals, translocations, fissions, and fusions), while other genome rearrangements, such as transpositions, can be modeled by pairs of DCJs. Since the DCJ model does not directly account for transpositions, their impact on DCJ scenarios is unclear. In the present work, we study the implicit appearance of transpositions (as pairs of DCJs) in DCJ scenarios. We consider shortest DCJ scenarios satisfying the maximum parsimony assumption, as well as more general DCJ scenarios based on some realistic but less restrictive assumptions. In both cases, we derive a uniform lower bound for the rate of implicit transpositions, which depends only on the genomes and not on a particular DCJ scenario between them. Our results imply that the implicit appearance of transpositions in DCJ scenarios may be unavoidable or even abundant for some pairs of genomes. We estimate that for mammalian genomes implicit transpositions constitute at least 6% of genome rearrangements.
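The claim above that a transposition can be modeled by a pair of DCJs can be checked on a toy example. The Python sketch below computes the DCJ distance for the simplest setting only: two circular, single-chromosome genomes over the same genes, each gene in one copy, where the distance equals the number of genes minus the number of cycles in the breakpoint graph. The function name `dcj_distance_circular` and the signed-integer gene encoding are assumptions made for this example, and the paper's analysis of implicit transpositions is not reproduced here.

```python
def dcj_distance_circular(genome_a, genome_b):
    """DCJ distance between two circular unichromosomal genomes.

    Each genome is a list of nonzero signed integers (hypothetical encoding):
    the sign is the gene's strand and the list is read circularly.
    Assumes both genomes contain the same genes, each exactly once.
    """
    genes_a = sorted(abs(g) for g in genome_a)
    genes_b = sorted(abs(g) for g in genome_b)
    if genes_a != genes_b or len(set(genes_a)) != len(genes_a):
        raise ValueError("genomes must share the same single-copy genes")

    def adjacencies(genome):
        # Map every gene extremity to the extremity it is adjacent to.
        adj = {}
        for k, left in enumerate(genome):
            right = genome[(k + 1) % len(genome)]
            # Right end of `left`: head if forward, tail if reversed.
            x = (abs(left), "h" if left > 0 else "t")
            # Left end of `right`: tail if forward, head if reversed.
            y = (abs(right), "t" if right > 0 else "h")
            adj[x], adj[y] = y, x
        return adj

    adj_a, adj_b = adjacencies(genome_a), adjacencies(genome_b)

    # Count cycles that alternate between adjacencies of A and of B.
    seen, cycles = set(), 0
    for start in adj_a:
        if start in seen:
            continue
        cycles += 1
        x = start
        while x not in seen:
            seen.add(x)
            y = adj_a[x]
            seen.add(y)
            x = adj_b[y]
    return len(genome_a) - cycles


print(dcj_distance_circular([1, 2, 3], [1, -2, 3]))        # one reversal
print(dcj_distance_circular([1, 2, 3, 4], [1, 3, 2, 4]))   # one transposition
```

In this sketch the reversal costs one DCJ, while swapping the adjacent genes 2 and 3 (a transposition) costs two.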

Directory of Open Access Journals

Frontiers - Publisher Connector

Linearization of Median Genomes Under the Double-Cut-and-Join-Indel Model.

Author: Alekseyev Max A
Avdeyev Pavel
Jiang Shuai
Publication venue: Health Sciences Research Commons
Publication date: 01/01/2019

George Washington University: Health Sciences Research Commons (HSRC)

Biological computation and computational biology: survey, challenges, and discussion

Author: Avdeyev Pavel
Bayzid Md. Shamsuzzoha
Chelly Dagdia Zaineb
Publication venue: Springer Verlag
Publication date: 01/01/2021

Biological computation involves the design and development of computational techniques inspired by natural biota. On the other hand, computational biology involves the development and application of computational techniques to study biological systems. We present a comprehensive review showcasing how biology and computer science can guide and benefit each other, resulting in improved understanding of biological processes and at the same time advances in the design of algorithms. Unfortunately, integration between biology and computer science is often challenging, especially due to the cultural idiosyncrasies of these two communities. In this study, we aim at highlighting how nature has inspired the development of various algorithms and techniques in computer science, and how computational techniques and mathematical modeling have helped to better understand various fields in biology. We identify existing gaps between biological computation and computational biology and advocate for bridging these gaps between "wet" and "dry" research. The discussion in this paper about the challenges and importance of filling the gaps between the biological computation and computational biology communities represents the outcome of an analysis carried out at the 5th Heidelberg Laureate Forum, specifically during a workshop (https://scilogs.spektrum.de/hlf/experience-learn-share-heidelberg-laureate-forum/) organized by Dr. Zaineb Chelly Dagdia and mentored by Professor Stephen Smale (Fields Medal awardee). After the workshop, a collaboration was formed between Dr. Zaineb Chelly Dagdia, who works on biological computation, and two participants and contributors to the workshop, Pavel Avdeyev and Dr. Md. Shamsuzzoha Bayzid, who work on different areas in computational biology.

HAL UVSQ

A Unified ILP Framework for Core Ancestral Genome Reconstruction Problems.

Author: Alekseyev Max A
Alexeev Nikita
Avdeyev Pavel
Rong Yongwu
Publication venue: 'Oxford University Press (OUP)'
Publication date: 14/02/2020

George Washington University: Health Sciences Research Commons (HSRC)

Chromosome-level genome assembly, annotation, and phylogenomics of the gooseneck barnacle Pollicipes pollicipes

Author: Alexeev Nikita
Avdeyev Pavel
Bernot James P.
Crandall Keith A.
Dreyer Niklas
Pérez-Losada Marcos
Zamyatin Anton
Publication venue: 'Oxford University Press (OUP)'
Publication date: 01/01/2022

BACKGROUND: The barnacles are a group of >2,000 species that have fascinated biologists, including Darwin, for centuries. Their lifestyles are extremely diverse, from free-swimming larvae to sessile adults, and even root-like endoparasites. Barnacles also cause hundreds of millions of dollars of losses annually due to biofouling. However, genomic resources for crustaceans, and barnacles in particular, are lacking. RESULTS: Using 62× Pacific Biosciences coverage, 189× Illumina whole-genome sequencing coverage, 203× HiC coverage, and 69× CHi-C coverage, we produced a chromosome-level genome assembly of the gooseneck barnacle Pollicipes pollicipes. The P. pollicipes genome is 770 Mb long and its assembly is one of the most contiguous and complete crustacean genomes available, with a scaffold N50 of 47 Mb and 90.5% of the BUSCO Arthropoda gene set. Using the genome annotation produced here along with transcriptomes of 13 other barnacle species, we completed phylogenomic analyses on a nearly 2 million amino acid alignment. Contrary to previous studies, our phylogenies suggest that the Pollicipedomorpha is monophyletic and sister to the Balanomorpha, which alters our understanding of barnacle larval evolution and suggests homoplasy in a number of naupliar characters. We also compared transcriptomes of P. pollicipes nauplius larvae and adults and found that nearly one-half of the genes in the genome are differentially expressed, highlighting the vastly different transcriptomes of larvae and adult gooseneck barnacles. Annotation of the genes with KEGG and GO terms reveals that these stages exhibit many differences including cuticle binding, chitin binding, microtubule motor activity, and membrane adhesion. CONCLUSION: This study provides high-quality genomic resources for a key group of crustaceans. This is especially valuable given the roles P. pollicipes plays in European fisheries, as a sentinel species for coastal ecosystems, and as a model for studying barnacle adhesion as well as its key position in the barnacle tree of life. A combination of genomic, phylogenetic, and transcriptomic analyses here provides valuable insights into the evolution and development of barnacles.
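The abstract above reports a scaffold N50 of 47 Mb. For readers unfamiliar with the statistic, the following Python sketch shows the standard definition: sort the scaffold lengths from longest to shortest and report the length at which the running total first reaches half of the assembly size. The function name `n50` and the toy lengths are illustrative assumptions and are not data from the paper.

```python
def n50(lengths):
    """Return the N50 of a collection of scaffold (or contig) lengths.

    N50 is the length L such that scaffolds of length >= L together
    cover at least half of the total assembly length.
    """
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        # Compare 2 * running with total to avoid fractional halves.
        if 2 * running >= total:
            return length
    raise ValueError("lengths must be non-empty")


# Toy example (made-up lengths): total is 20, and 6 + 5 = 11 >= 10.
print(n50([2, 3, 4, 5, 6]))
```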

PubMed Central

Copenhagen University Research Information System

George Washington University: Health Sciences Research Commons (HSRC)

Evaluation of haplotype callers for next-generation sequencing of viruses

Author: Alexeev Nikita
Avdeyev Pavel
Bendall Matthew L
Crandall Keith A
Eliseev Anton
Gibson Keylie M
Novik Dmitry
Perez-Losada Marcos
Publication venue: 'Cold Spring Harbor Laboratory'
Publication date: 01/01/2019

Currently, the standard practice for assembling next-generation sequencing (NGS) reads of viral genomes is to summarize thousands of individual short reads into a single consensus sequence, thus confounding useful intra-host diversity information for molecular phylodynamic inference. It is hypothesized that a few viral strains may dominate the intra-host genetic diversity with a variety of lower frequency strains comprising the rest of the population. Several software tools currently exist to convert NGS sequence variants into haplotypes. Previous benchmarks of viral haplotype reconstruction programs used simulation scenarios that are useful from a mathematical perspective but do not reflect viral evolution and epidemiology. Here, we tested twelve NGS haplotype reconstruction methods using viral populations simulated under realistic evolutionary dynamics. We simulated coalescent-based populations that spanned known levels of viral genetic diversity, including mutation rates, sample size and effective population size, to test the limits of the haplotype reconstruction methods and to ensure coverage of predicted intra-host viral diversity levels (especially HIV-1). All twelve investigated haplotype callers showed variable performance and produced drastically different results that were mainly driven by differences in mutation rate and, to a lesser extent, in effective population size. Most methods were able to accurately reconstruct haplotypes when genetic diversity was low. However, under higher levels of diversity (e.g., those seen in intra-host HIV-1 infections), haplotype reconstruction quality was highly variable and, on average, poor. All haplotype reconstruction tools, except QuasiRecomb and ShoRAH, greatly underestimated intra-host diversity and the true number of haplotypes. PredictHaplo outperformed the other haplotype reconstruction tools, with the highest precision and recall and the lowest UniFrac distance values, followed by CliqueSNV, which, given more computational time, may have outperformed PredictHaplo. Here, we present an extensive comparison of available viral haplotype reconstruction tools and provide insights for future improvements in haplotype reconstruction tools using both short-read and long-read technologies.

PubMed Central

George Washington University: Health Sciences Research Commons (HSRC)

Comparative genomics meets topology: a novel view on genome median and halving problems

Author: A Zvonkin
C Zheng
F Hu
IP Goulden
J Erickson
J Harer
J Mixtacki
JE Andersen
JH Postlethwait
KM Swenson
M Haghighi
M Kellis
MA Alekseyev
Max A. Alekseyev
MDV Braga
N El-Mabrouk
Nikita Alexeev
NV Alexeev
P Avdeyev
P Compeau
P Dehal
P Feijão
Pavel Avdeyev
R Guyot
R Penner
R Warren
S Yancopoulos
U Haagerup
Y Gagnon
Publication venue: 'Springer Science and Business Media LLC'

Crossref